Skip to content

autotailor: Add validation for variable names, rule IDs, and profile IDs #2342

Merged
jan-cerny merged 3 commits intoOpenSCAP:mainfrom
ggbecker:autotailor-validate-datastream-fields
Apr 27, 2026
Merged

autotailor: Add validation for variable names, rule IDs, and profile IDs #2342
jan-cerny merged 3 commits intoOpenSCAP:mainfrom
ggbecker:autotailor-validate-datastream-fields

Conversation

@ggbecker
Copy link
Copy Markdown
Member

Add DataStreamValidator class that validates all IDs against the SCAP
datastream before generating tailoring XML. This prevents silent failures
from invalid variable names, rule IDs, group IDs, or profile IDs.

Key features:
- Parses datastream to extract valid profiles, values, rules, and groups
- Validates IDs before use in Profile and Tailoring classes
- Provides fuzzy matching suggestions for typos using difflib
- Generates clear error messages with suggestions
- Add --no-validate flag for performance-critical use cases

Performance:
- ~227ms overhead on 20MB datastream (validation enabled by default)
- ~33ms with --no-validate flag (7x faster)
- Validation prevents compliance drift and silent failures

Fixes issue where autotailor accepted arbitrary variable names without
validation, creating invalid XML that fails at evaluation time.
Add comprehensive unit tests for the new validation feature:

- test_datastream_validator: Tests validator with valid and invalid IDs
  for profiles, values, rules, and groups

- test_profile_with_validator: Tests Profile class integration with
  validator, ensuring invalid IDs are rejected

- test_validator_suggestions: Tests fuzzy matching suggestions for
  typos in ID names

All tests pass and verify that:
- Valid IDs are accepted
- Invalid IDs are rejected with clear error messages
- Similar valid IDs are suggested for typos
- Validation integrates properly with Profile class
Comment thread utils/autotailor Outdated
root = tree.getroot()

# Register namespaces
namespaces = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a global variable. It could be used all over the script.

Comment thread utils/autotailor
self.group_ids = set()
self._parse_datastream()

def _parse_datastream(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split this method to more smaller methods.

Comment thread utils/autotailor Outdated
Comment on lines +613 to +614
help="Skip validation of IDs against the datastream. This significantly speeds up "
"execution on large datastreams but may produce invalid tailoring files if incorrect "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help="Skip validation of IDs against the datastream. This significantly speeds up "
"execution on large datastreams but may produce invalid tailoring files if incorrect "
help="Skip validation of IDs against the data stream. This significantly speeds up "
"execution on large data streams but may produce invalid tailoring files if incorrect "

Comment thread utils/autotailor Outdated
return msg

def validate_profile(self, profile_id):
"""Validate a profile ID exists in the datastream."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always a "data stream", never a "datastream" or even something worse like "DataStream".

Comment thread utils/autotailor
if args.json_tailoring:
t.import_json_tailoring(args.json_tailoring)
try:
t.import_json_tailoring(args.json_tailoring)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the JSON tailoring also validated?

Comment thread utils/autotailor
help="Use local path for the benchmark href instead of absolute file:// URI. "
"Absolute paths are converted to basename, relative paths are preserved.")
parser.add_argument(
"--no-validate", action="store_true",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this option to the man page.

Comment thread utils/autotailor
if profile_id not in self.profile_ids:
raise ValueError(self._create_validation_error("Profile", profile_id, self.profile_ids))

def validate_value(self, value_id):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be amazing if the tool would also verify validity of selectors used in the -V CLI option.

For example:

utils/autotailor -V xccdf_org.ssgproject.content_value_var_selinux_state=OOO /usr/share/xml/scap/ssg/content/ssg-fedora-ds.xml cis

This command should error because the XCCDF Value doesn't contain selector OOO. But currently it isn't checked and the command doesn't error and creates invalid tailoring.

- Move XML namespace dict to module-level DS_NAMESPACES constant
- Split _parse_datastream into _extract_profiles/values/rules/groups
- Add selector validation for -V/--var-select option
- Fix terminology: "datastream" -> "data stream" in all user-facing text
- Add --no-validate option to the man page
@sonarqubecloud
Copy link
Copy Markdown

@ggbecker
Copy link
Copy Markdown
Member Author

@jan-cerny I've addressed the feedback on ef1d59d

Copy link
Copy Markdown
Member

@jan-cerny jan-cerny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried various combinations of parameters (selections, unselections, variables, selectors) and verified that they are validated against the data stream. I also used the --no-validate option. Tests passed both locally and in CI.

@jan-cerny jan-cerny merged commit e90fa55 into OpenSCAP:main Apr 27, 2026
21 checks passed
@jan-cerny jan-cerny added this to the 1.4.5 milestone Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants